Methods and apparatus for text detection

ABSTRACT

A text detection technique comprises local ramp detection, identification of intensity troughs (candidate text strokes), determination of stroke width, preliminary detection of text based on contrast and stroke width, and a consistency check.

REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 10/887,940, filed 9 Jul. 2004, now U.S. Pat. No. 7,177,472, which is a continuation of U.S. application Ser. No. 09/808,791, filed 14 Mar. 2001, now U.S. Pat. No. 6,778,700.

BACKGROUND

The invention relates to text that is contained in transmitted pages. More particularly, the invention relates to a method and apparatus for segmenting a scanned page into text and non-text areas.

Text or pictorial images are often replicated or transmitted by a variety of techniques, such as photocopying, facsimile transmission, and scanning of images into a memory device. The process of replication or transmission often tends to degrade the resulting image due to a variety of factors. Degraded images are characterized by indistinct or shifted edges, blended or otherwise connected characters, and distorted shapes.

A reproduced or transmitted image that is degraded in quality may be unusable in certain applications. For example, if the reproduced or transmitted image is to be used in conjunction with a character recognition apparatus, the indistinct edges and/or connected characters may preclude accurate or successful recognition of characters in the image. Also, if the degraded image is printed or otherwise rendered visible, the image may be more difficult to read and less visually distinct.

There are several approaches to improving image quality. One known resolution enhancement algorithm provides template matching. Template matching attempts to match a line, curve pattern, or linear pattern and then tries to find the best way to reconstruct it within the available printing resolution.

Other methods for text enhancement come from the area of Optical Character Recognition (OCR). The main purpose of OCR is to isolate the characters within a block of text from one another. Such methods are more related to morphological filters that repetitively perform thickening and thinning and opening and closing to get the desired character shape.

Shiau et al. U.S. Pat. No. 5,852,678 and related European Patent Application No. EP 0810774 disclose a method and apparatus that improves digital reproduction of a compound document image containing half-tone tint regions and text and/or graphics embedded within the half-tone tint regions. The method entails determining a local average pixel value for each pixel in the image, then discriminating and classifying, based on the local average pixel values, text/graphics pixels from half-tone tint pixels. Discrimination can be effected by calculating a range of local averages within a neighborhood surrounding each pixel; by calculating edge gradients based on the local average pixel values; or by approximating second derivatives of the local average pixel values based on the local averages. Text/graphics pixels are rendered using a rendering method appropriate for that type of pixel; half-tone tint pixels are rendered using a rendering method appropriate for that type of pixel.

Barski et al. U.S. Pat. No. 5,212,741 discloses a method and apparatus for processing image data of dot-matrix/ink-jet printed text to perform OCR of such image data. In the method and apparatus, the image data are viewed for detecting if dot-matrix/ink-jet printed text is present. Any detected dot-matrix/ink-jet produced text is then pre-processed by determining the image characteristic thereof by forming a histogram of pixel density values in the image data. A 2-D spatial averaging operation as a second pre-processing step smooths the dots of the characters into strokes and reduces the dynamic range of the image data. The resultant spatially averaged image data is then contrast stretched in a third pre-processing step to darken dark regions of the image data and lighten light regions of the image data. Edge enhancement is then applied to the contrast stretched image data in a fourth pre-processing step to bring out higher frequency line details. The edge enhanced image data is then binarized and applied to a dot-matrix/ink jet neural network classifier for recognizing characters in the binarized image data from a predetermined set of symbols prior to OCR.

The prior art teaches global techniques aimed at intelligent binarization, OCR, and document image analysis. It neither teaches nor suggests local techniques aimed at text and graphic outlines as opposed to the entire text and graphics region.

It would be advantageous to provide a technique that detects text outline and line art in a color document image.

It would also be advantageous to provide a technique that provides good color reproduction of document images that contain text.

It would also be advantageous to provide a text detection technique that is simple and less computationally intensive, i.e., that requires no complex feature vectors, no transforms, no color clustering, and no cross-correlation, and thereby is suitable for high resolution scans.

It would also be advantageous to provide a text detection technique that is local, i.e., that does not require the scanning of an entire document before processing, and that is thereby fast. It would be desirable for processing to begin as the document is being scanned. Part of a character can be processed without needing the entire character. In such an approach, neither the text character nor the entire word would be recognized.

It would also be advantageous to provide a text detection technique that uses adaptive thresholds on text stroke width.

It would also be advantageous to provide a text detection technique that provides important information, such as stroke width and background estimate, that may be used for a subsequent text enhancement procedure.

It would also be advantageous to provide a text detection technique that handles text on light half-tone background.

It would also be advantageous to provide a text detection technique that handles very thin text blurred by a device, such as by a scanner.

It would also be advantageous to provide a text detection technique in which a high local contrast requirement could reduce errors in detection so that they are not easily perceivable after enhancement.

SUMMARY

A text detection method and apparatus is provided that comprises the following five logical components: 1) local ramp detection; 2) identification of intensity troughs (candidate text strokes); 3) determination of stroke width; 4) preliminary detection of text based on contrast and stroke width; and 5) a consistency check.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A shows an example of applying a vertical filter to a 3×3 pixel region according to the invention;

FIG. 1B is a graph of an intensity trough according to the invention;

FIG. 2 is a schematic diagram of a text stroke according to the invention;

FIG. 3 is a block schematic diagram of an exemplary embodiment of an image processing system 300 that includes a contrast based processing module according to the invention; and

FIG. 4 is a flow diagram of a text detection path that includes a contrast based processing step according to the invention.

DETAILED DESCRIPTION

One goal of an exemplary embodiment of the invention is to segment a scanned page into text and non-text areas, for example, where text is subsequently to be processed differently than the non-text. Another goal of an exemplary embodiment of the invention is to provide input for further processing of the detected text areas, such as, for example, by a subsequent text enhancement system that improves the sharpness of text outlines.

In an exemplary embodiment of the invention, a text detection method and apparatus is provided that comprises the following five logical components: 1) local ramp detection; 2) identification of intensity troughs (candidate text strokes); 3) determination of stroke width; 4) preliminary detection of text based on contrast and stroke width; and 5) a consistency check. These components of the invention are discussed in detail below.

The text detection technique disclosed herein (also referred to interchangeably herein as an algorithm) is based on the observation that text outlines have very high contrast in relation to the background. Therefore, a region is labeled text if, in the region, very strong contrast in the form of relatively sharp edges is observed, and provided that the dark side is also close to being neutral in color, i.e., the color saturation is small. The text detection technique described herein is therefore applicable to any of black text on white background, black text on color background, and white or light text on dark background (reverse text). The technique described herein typically does not detect half-tone text because in this case sharp edges cannot be detected reliably. However, the fact that the technique described herein typically does not detect half-tone text is not considered to be a disadvantage because high quality text generally is not half-toned, and it is such high quality text that becomes degraded most as a result of scanning.

In an exemplary embodiment, gray level, i.e., scanned intensity, information is input to the discussed text detection algorithm because the technique herein disclosed depends mainly on contrast for text detection. In other embodiments of the invention, e.g., using a color printer as a target device, two additional pieces of information are used to improve the detection accuracy further. They are: 1) a measure of the color saturation; and 2) a local indicator of the presence of half-tone pixels. These additional pieces of information do not need to be included for text detection where the intended target device is a black and white printer.

In the case of a color printer target device, the measure of the color saturation is estimated through a preliminary step of single pixel processing, provided that color information from the scanner is available. Such processing is used mainly to prevent the subsequent text enhancement system mentioned above from enhancing colored text with a black ink outline. According to the invention herein, a region is labeled text if in the region very strong contrast in the form of relatively sharp edges is observed, and provided that the dark side is also close to being neutral in color, i.e., that the color saturation is small.

In the case of a gray scale scanner, there is no color information. In this case, all dark text is enhanced with a black outline as desired.

The local indicator of the presence of half-tone pixels is obtained through a general algorithm for half-tone detection. The local indicator is not critical for the functioning of the detection algorithm herein described and so can be waived if not readily available.

In an exemplary embodiment of the invention, regions of a scanned page that do not contain text but that contain very strong contrast can also be detected as text. This situation does not present a problem because typically only a very thin black outline is added to detected text. For example, the subsequent text enhancement system discussed above adds a very thin black outline to detected text. Therefore, a high local contrast requirement and an adding of only a very thin black outline implicitly guarantee that errors in text detection are not easily perceivable after enhancement.

A blurring of scanned text due to scanner resolution limitations tends to reduce observable local contrast and hence detection accuracy, especially in the case of thin text. This situation presents an issue that requires explicit treatment in the algorithm discussed below through three pre-processing steps for thin text.

In an exemplary embodiment of the invention, the text detection algorithm comprises the following five steps:

1) Local ramp detection;

2) Identification of intensity troughs (candidate text strokes);

3) Determination of stroke width;

4) Preliminary detection of text based on contrast and stroke width; and

5) Consistency check.

Steps 1)-3) are pre-processing steps for thin text, and steps 4)-5) are contrast based steps. It is also noted that in an alternative exemplary embodiment of the invention, the pre-processing steps for thin text can be omitted. That is, only step 4) (without measuring stroke width) and step 5) need to be performed.

For each step 1)-5), the means for implementation may vary greatly, and yet remain within the scope of the invention. An exemplary implementation for each step is provided below in pseudo-code form, except for step 1), which provides preferred matrices of coefficients and a mechanism for calculating an output using the matrices. One skilled in the art can readily implement the pseudo-code in hardware or software, as preferred.

A detailed description of each step is given in the appropriate section herein below.

Step 1: Local Ramp Detection

Scanned intensity values are required as input into this first step. Refer to the discussion of FIG. 1A below for an example of input. Nine different 3×3 high-pass filters are used to detect the presence of steep ramps or edges. For purposes of the discussion herein, a filter is a window of values centered on a pixel, whereby a filtered output is generated using all or some of the values of the window. After a filtered output is generated for the current or centered pixel, the window (filter) is moved across, typically over to the left or right one pixel, or up or down one pixel, and centers itself on another pixel to generate a filtered output for that other pixel. The kernels of these nine filters are depicted in Table A herein below. For purposes of the discussion herein, a kernel is a matrix of coefficients of a filter, wherein each coefficient is used in calculating or generating a filtered output.

TABLE A

For vertical ramps:

       v1              v2              v3
   −1   1   0      −1   1   0       0  −1   1
   −1   1   0       0  −1   1      −1   1   0
   −1   1   0      −1   1   0       0  −1   1

For horizontal ramps:

       h1              h2              h3
   −1  −1  −1      −1   0  −1       0  −1   0
    1   1   1       1  −1   1      −1   1  −1
    0   0   0       0   1   0       1   0   1

For diagonal ramps (vertical detection):

       dv1             dv2
    1   0   0      −1  −1   1
   −1   1   0      −1   1   0
   −1  −1   1       1   0   0

For diagonal ramps (horizontal detection):

       dh1             dh2
    1  −1  −1      −1  −1   1
    0   1  −1      −1   1   0
    0   0   1       1   0   0

Note that dv2 = dh2.

The kernels of the nine filters included in Table A are by no means the only enabling kernels. In other embodiments of the invention, one skilled in the art may want to use, for example, fewer kernels, but create the same result by having such kernels be more complicated. That is, the algorithm is flexible because it allows for preferences and tradeoffs in the choice of kernels. The kernels in Table A were found to be simple and fast.

In an exemplary embodiment of the invention, a vertical ramp is detected when the filtered output, i.e., the absolute value of any one of v1, v2, or v3, is greater than a threshold value, T_(ramp). FIG. 1A shows an example of applying a vertical filter to a 3×3 pixel region. The algorithm is applied to a text letter A 1 and evaluates a 3×3 pixel region 2 within the letter A. For clarity, the 3×3 pixel region 2 being evaluated is shown enlarged below the letter A and to the left. The high-pass filter v1 3 is applied to the 3×3 pixel region 2. The v1 filter output is calculated 4, yielding an output of 10. After the algorithm is finished with region 2, the filter is shifted to the right and applied to a second region 5. The sign of the output determines the sign of the ramp: light to dark is negative and dark to light is positive. The magnitude of the output is quantized in units of T_(ramp)/3, and the quantized value is passed to the next step as the ramp strength.

Similarly, a horizontal ramp is detected when the output, i.e., the absolute value of any one of h1, h2, or h3, is greater than the threshold value, T_(ramp). If no vertical or horizontal ramp is detected, then the filters for diagonal ramp detection are investigated.

A diagonal ramp is detected in the vertical sense if the output, i.e., the absolute value of dv1 or dv2, is greater than the threshold value, T_(ramp). Similarly, a diagonal ramp is detected in the horizontal sense if the output, i.e., the absolute value of dh1 or dh2, is greater than the threshold value, T_(ramp).

It should be appreciated that the software and hardware implementation of local ramp detection may vary greatly without departing from the scope of the invention claimed herein.
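By way of illustration only, the vertical and horizontal ramp tests of Step 1 might be sketched in Python as follows. The kernels are those of Table A (h1 to h3 being the transposes of v1 to v3). The numeric value of T_RAMP, the rule that the output of largest magnitude is the one reported, the integer quantization, and the function names filter_output and ramp_strength are assumptions made for the sketch and are not taken from the description.

```python
# Vertical kernels from Table A (rows top to bottom, columns left to right).
V1 = [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]]
V2 = [[-1, 1, 0], [0, -1, 1], [-1, 1, 0]]
V3 = [[0, -1, 1], [-1, 1, 0], [0, -1, 1]]


def transpose(kernel):
    return [list(col) for col in zip(*kernel)]


# In Table A the horizontal kernels are the transposes of the vertical ones.
H1, H2, H3 = transpose(V1), transpose(V2), transpose(V3)

T_RAMP = 90  # hypothetical threshold; the description gives no numeric value


def filter_output(region, kernel):
    """Sum of element-wise products of a 3x3 region and a 3x3 kernel."""
    return sum(region[r][c] * kernel[r][c] for r in range(3) for c in range(3))


def ramp_strength(region, kernels, t_ramp=T_RAMP):
    """Signed ramp strength in units of t_ramp/3, or 0 if no ramp is found.

    Assumption: the kernel output with the largest magnitude is reported.
    """
    best = max((filter_output(region, k) for k in kernels), key=abs)
    if abs(best) <= t_ramp:
        return 0  # no ramp detected
    units = (abs(best) * 3) // t_ramp  # quantize the magnitude
    return units if best > 0 else -units
```

For a region whose columns fall from intensity 200 to 50 from left to right, ramp_strength(region, [V1, V2, V3]) is negative, in keeping with the convention that light to dark is negative.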

Step 2: Identification of Intensity Troughs

In an exemplary embodiment of the invention, an intensity trough refers to the pairing of a light-to-dark ramp, or negative ramp, and a dark-to-light ramp, or positive ramp, close to each other. This represents a thin text stroke. While a western alphabet is discussed herein, the invention also handles other alphabets, such as Japanese Kanji. The purpose of this step is to identify these thin strokes so that in subsequent steps compensation is made for contrast loss as a result of scanner blurring. The case of intensity ridges, i.e., a positive ramp followed by a negative ramp, is not handled in an embodiment of the invention. Intensity ridges can be added readily if accurate detection and enhancement of very thin reverse text is important.

In an exemplary embodiment of the invention, identification of intensity troughs is performed through a finite state machine (FSM) algorithm. Scanned pages are swept from left to right to detect vertical troughs and from top to bottom to detect horizontal troughs. The left to right sweep is described below. The procedure for the top to bottom sweep is similar.

For each row in the scanned page, the algorithm starts at state 0 at the leftmost pixel of the row. The sweep procedure sweeps to the right one pixel at a time. The FSM has five possible states, which are listed in Table B below.

TABLE B

   State 0: default (i.e., non-text)
   State 1: going downhill (negative ramping in intensity)
   State 2: bottom of trough (body of text stroke)
   State 3: going uphill (positive ramping in intensity)
   State 4: end of uphill (reset)

For each pixel, the signed ramp strength result from Step 1 is used as an input to the FSM algorithm. For sweeping from left to right, only the vertical ramp strength and the diagonal ramp strength detected in the vertical sense are used. As an option, instead of just using the current pixel, when the FSM is in state 0, 1, or 4, the input is taken as the minimum of the signed ramp strengths at the current pixel, the pixel above, and the pixel below. When the FSM is in state 2 or 3, the input is taken as the corresponding maximum. After the FSM has processed the current input, its new state, which can be unchanged, is assigned as the state of the current pixel. The rules for state changes are summarized in the pseudo-code presented below in Table C.

TABLE C

   if (state=4) state=0;
   if (state=0 AND input<0) new_state=1, count1=input, count2=1;
   else if (state=0 AND input>=0) new_state=0;
   else if (state=1 AND input<0) new_state=1, count1+=input, count2++;
   else if (state=1 AND input=0) new_state=1, count2++;
   else if (state=1 AND input>0) new_state=0;
   else if (state=2 AND input<0) new_state=2, count1+=input, count2++;
   else if (state=2 AND input=0) new_state=2, count2++;
   else if (state=2 AND input>0) new_state=3, count3=count2, count1=input, count2=1;
   else if (state=3 AND input<0) new_state=1, count1=input, count2=1;
   else if (state=3 AND input=0) new_state=3, count2++;
   else if (state=3 AND input>0) new_state=3, count1+=input, count2++;
   else new_state=state;

   if (new_state=1) {
     if (count2>4) new_state=0;
     else if (count1<=-edge_threshold) new_state=2, count2=1;
   }
   else if (new_state=2) {
     if (count1>max_ramp_strength) new_state=0;
     else if (count2>max_width) new_state=0;
   }
   else if (new_state=3) {
     if (count2>4) new_state=0;
     else if (count1>=edge_threshold) new_state=4;
   }
   if (new_state=0) count1=0, count2=0, count3=0;

The variable count1 above represents the cumulative ramp strength in units of T_(ramp)/3. The variable count2 represents the cumulative duration in pixels of stay in a particular state. The variable count3 is the total duration of stay in state 2 before switching to states 3 or 4. The threshold T_(edge) (edge_threshold in Table C) is the cumulative ramp strength required for identification of a high-contrast edge. Max_ramp_strength is the maximum ramping allowed, and Max_width is an upper limit to stroke widths that can be detected.

The state at each pixel, its corresponding count2 value, and, if at state 4, the variable count3, are all passed on to the next step for text stroke width determination.

The pseudo-code above in Table C represents an exemplary embodiment of the invention and is by no means limiting.

FIG. 1B is a graph of an intensity trough 10 according to the invention. State 0 occurs in a first region 11 at intensity value 255. State 1 occurs in a second region 12 with negative slope from region 11. State 2 occurs in a third region 13 at the bottom of the trough. State 3 occurs in a fourth region 14 from the bottom of the trough with positive slope up to level 255. State 4 occurs in the fifth region 15 at level 255.

At states 0, 1, and 4, the method looks for downward slopes, i.e., negative vertical ramps. The algorithm obtains the strongest negative vertical ramp by finding the minimum value among the three vertical ramp values, i.e., those at the current pixel, the pixel above, and the pixel below. At states 2 and 3, the algorithm looks for upward slopes, i.e., positive vertical ramps. The method obtains the strongest positive vertical ramp by finding the maximum value among the same three vertical ramp values.
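By way of illustration only, the left to right sweep of Table C might be sketched in Python as follows. The sketch takes one signed ramp strength per pixel and does not implement the optional minimum or maximum over the pixels above and below. The default values of edge_threshold, max_ramp_strength, and max_width, and the function name sweep_row, are assumptions. The variable count3 is captured before count2 is reset, so that it holds the time spent in state 2 as the description of count3 requires.

```python
def sweep_row(inputs, edge_threshold=6, max_ramp_strength=60, max_width=8):
    """Run the trough FSM over one row of signed ramp strengths.

    Returns one (state, count2, count3) tuple per pixel.
    The default thresholds are hypothetical values.
    """
    state = count1 = count2 = count3 = 0
    out = []
    for x in inputs:
        if state == 4:  # end of uphill resets to the default state
            state = 0
        new = state
        if state == 0:
            if x < 0:
                new, count1, count2 = 1, x, 1
        elif state == 1:
            if x < 0:
                count1, count2 = count1 + x, count2 + 1
            elif x == 0:
                count2 += 1
            else:
                new = 0
        elif state == 2:
            if x < 0:
                count1, count2 = count1 + x, count2 + 1
            elif x == 0:
                count2 += 1
            else:
                # Save the stay in state 2 (old count2) before the reset.
                new, count3, count1, count2 = 3, count2, x, 1
        elif state == 3:
            if x < 0:
                new, count1, count2 = 1, x, 1
            elif x == 0:
                count2 += 1
            else:
                count1, count2 = count1 + x, count2 + 1
        # Checks applied to the tentative new state, as in Table C.
        if new == 1:
            if count2 > 4:
                new = 0
            elif count1 <= -edge_threshold:
                new, count2 = 2, 1
        elif new == 2:
            if count1 > max_ramp_strength or count2 > max_width:
                new = 0
        elif new == 3:
            if count2 > 4:
                new = 0
            elif count1 >= edge_threshold:
                new = 4
        if new == 0:
            count1 = count2 = count3 = 0
        state = new
        out.append((state, count2, count3))
    return out
```

For the input row [0, -3, -4, 0, 0, 5, 3, 0], the sketch visits states 0, 1, 2, 2, 2, 3, 4, 0 and reports count3 = 3 at the pixel in state 4, i.e., three pixels were spent at the bottom of the trough.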

Step 3: Determine Stroke Width

In an exemplary embodiment of the invention, two main tasks areperformed in this step:

(1) Determine the width and the skeleton of the text stroke in which the current pixel is located; and

(2) Detect closely touching text strokes for special treatment in a text enhancement algorithm to avoid merging them.

In another embodiment of the invention, in task (1), a text enhancement (TE) boost flag is also determined. The TE boost flag is a pixel-by-pixel indicator. It indicates for each pixel whether or not to reduce the intensity, i.e., boost the ink level within the text enhancement module to compensate for scanner blurring.

The discussion below pertains to task (1).

Referring to FIG. 2, the width of a text stroke at a current pixel 20 is defined as the smaller of the vertical distance 21 or the horizontal distance 22 between the two edges 23, 24 of the stroke. Its skeleton 25 is a line roughly equidistant from both edges 23, 24.

In an exemplary embodiment of the invention, to determine the vertical stroke width 21, the Step 2 results from the left to right sweep, i.e., for detection of vertical troughs, are used in a 1×N window beginning at the current pixel 20, wherein N=9 in the current implementation. The vertical stroke width 21, the position of the skeleton point in the current window, and, in another embodiment, the TE boost flag, of the current pixel 20 are determined according to the algorithm below in Table D, given in pseudo-code. It is noted that the prefix v_ denotes vertical detection.

TABLE D

   v_skeleton_flag=0;
   v_te_boost_flag=0;
   v_crnt_state=v_state[crnt];
   v_crnt_count2=v_count2[crnt];
   v_next_state=v_state[crnt+1];
   if (v_crnt_state=2 OR v_next_state=2 AND v_crnt_state<2 OR
       v_crnt_state>2 AND v_crnt_count2<=2) {
     for (i=0; i<win_size; i++) {
       run_state=v_state[crnt+i];
       if (run_state=4) {
         v_width=v_count3[crnt+i];
         if (v_crnt_state=2 AND
             (v_width=1 OR v_crnt_count2=(v_width − v_width/4)))
           v_skeleton_flag=1;
         if (v_crnt_state=2 AND (v_crnt_count2>1 OR v_width=1) OR
             v_crnt_state>2 AND
             (v_width=1 AND v_crnt_state=3 AND v_crnt_count2=1 OR
              v_width>1 AND (v_crnt_count2=1 AND v_crnt_state=3)))
           v_te_boost_flag=1;
         break;
       }
     }
   }
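By way of illustration only, Table D might be transcribed into Python as follows. The sketch assumes that AND binds more tightly than OR, that v_width/4 is an integer division, and that positions beyond the end of the row are treated as state 0. These points, together with the function name vertical_stroke_info and the use of None for an undetermined width, are assumptions and are not stated in the description.

```python
def vertical_stroke_info(v_state, v_count2, v_count3, crnt, win_size=9):
    """Return (width, skeleton_flag, te_boost_flag) for pixel index crnt.

    v_state, v_count2, v_count3 are per-pixel results of the Step 2 sweep.
    width is None when no state-4 pixel is found in the window.
    """
    n = len(v_state)
    cs, cc2 = v_state[crnt], v_count2[crnt]
    # Assumption: past the end of the row the state is taken as 0.
    ns = v_state[crnt + 1] if crnt + 1 < n else 0
    width, skeleton, boost = None, False, False
    if cs == 2 or (ns == 2 and cs < 2) or (cs > 2 and cc2 <= 2):
        for i in range(min(win_size, n - crnt)):
            if v_state[crnt + i] == 4:
                width = v_count3[crnt + i]
                # Assumption: width/4 in Table D is integer division.
                if cs == 2 and (width == 1 or cc2 == width - width // 4):
                    skeleton = True
                if (cs == 2 and (cc2 > 1 or width == 1)) or (
                    cs > 2
                    and (
                        (width == 1 and cs == 3 and cc2 == 1)
                        or (width > 1 and cc2 == 1 and cs == 3)
                    )
                ):
                    boost = True
                break
    return width, skeleton, boost
```

With per-pixel states [0, 1, 2, 2, 2, 3, 4, 0], count2 values [0, 1, 1, 2, 3, 1, 2, 0], and count3 equal to 3 at the state-4 pixel, the call for crnt=3 returns a width of 3 with the TE boost flag set.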

In an exemplary embodiment of the invention, the horizontal stroke width 22, skeleton 25, and, in another embodiment, the TE boost flag, are determined similarly. The implementation uses the top to bottom sweep results from Step 2 as input and is performed on pixels in an N×1 window at the current pixel 20.

The results from the vertical and horizontal paths are assembled to determine a single width, skeleton flag, and, in another embodiment, the TE boost flag, at each pixel, the pseudo-code of which is provided in Table E below.

TABLE E

   if (v_width<h_width) {
     width=v_width;
     skeleton_flag=v_skeleton_flag;
     te_boost_flag=v_te_boost_flag;
   }
   else if (h_width<v_width) {
     width=h_width;
     skeleton_flag=h_skeleton_flag;
     te_boost_flag=h_te_boost_flag;
   }
   else {
     width=v_width;
     skeleton_flag=v_skeleton_flag OR h_skeleton_flag;
     te_boost_flag=v_te_boost_flag OR h_te_boost_flag;
   }

The following pertains to task (2).

An exemplary embodiment of the invention looks for a pattern of dark-light-dark (DLD) in the horizontal or vertical direction within a very small window. Typically, the DLD pattern occurs mainly in between text strokes that became blurred towards one another after scanning. To determine the vertical DLD flag, the procedure is as described below.

An N×M window is centered at a current pixel. In an exemplary embodiment of the invention, N=7 and M=5. For each column, divide pixels into three disjoint groups: top; middle; and bottom. For N=7, use two pixels, three pixels, and two pixels, respectively, for the three groups. A DLD pattern is detected in the column if the difference between the darkest pixel in the top group and the lightest pixel in the middle group, and the difference between the darkest pixel in the bottom group and the lightest pixel in the middle group, are both greater than a threshold value, T_(dld). Next, the number of DLD detected columns within the window is counted. If the count is greater than a threshold, which in an exemplary embodiment is two, then the DLD flag of the current pixel is turned on.

In an exemplary embodiment of the invention, the determination of the DLD pattern in the horizontal direction is done similarly, but on an M×N window instead. The final DLD flag is turned on when either a horizontal or a vertical DLD pattern is detected. In an alternative exemplary embodiment of the invention, the flag is passed to a text enhancement module. The module modifies adaptive thresholds used therein to ensure that enhanced text strokes are cleanly separated from one another.
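By way of illustration only, the vertical DLD test described above might be sketched in Python as follows. The window is taken as 7 rows by 5 columns of intensities, with higher values being lighter. The numeric value of t_dld and the function name vertical_dld_flag are assumptions; the column count threshold of two follows the exemplary embodiment.

```python
def vertical_dld_flag(window, t_dld=40, min_columns=2):
    """Return True if more than min_columns columns show dark-light-dark.

    window: 7 rows x 5 columns of intensities centered on the current pixel.
    t_dld is a hypothetical threshold value.
    """
    count = 0
    for col in zip(*window):  # iterate over the columns of the window
        top, middle, bottom = col[:2], col[2:5], col[5:]
        lightest_mid = max(middle)
        # Darkest pixel of a group is the one with the minimum intensity.
        if (lightest_mid - min(top) > t_dld
                and lightest_mid - min(bottom) > t_dld):
            count += 1
    return count > min_columns
```

The horizontal test can reuse the same function on the transposed 5×7 window, and the final DLD flag is then the logical OR of the two results.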

Step 4: Preliminary Marking of Text Pixels

In an exemplary embodiment of the invention, this step provisionally decides whether the current pixel is a text pixel based on local contrast present in an N×N window and the width of the text stroke at that pixel. The current implementation uses N=9. Then numerous statistics of the pixels within the N×N window are collected. The list below in Table F comprises, but is not limited to, the collected statistics:

TABLE F

Thresholds used:

   contrast_light_threshold: minimum intensity level for text background
   contrast_dark_threshold: maximum intensity (minimum darkness) of text to be detected
   boosted_contrast_light_threshold: minimum intensity level for text background around crowded text strokes (this is smaller than contrast_light_threshold)
   medium_threshold: around 50% intensity
   max_thin_width: maximum width of a stroke that is considered thin
   max_very_thin_width: maximum width of a stroke that is considered very thin

Statistics collected:

   cnt_thin: number of pixels that are thin (width<=max_thin_width)
   cnt_inner_thin: number of pixels in the center 3×3 window that are thin
   cnt_thin_skeleton: number of pixels that are on a skeleton and are very thin (width<=max_very_thin_width)
   min_width: minimum width among the center 3×3 pixels
   min2_width: second smallest width among the center 3×3 pixels (will be the same as min_width if more than 1 pixel has the minimum width)
   lightest: highest intensity present in window
   cnt_light: number of light pixels (intensity>=contrast_light_threshold)
   cnt_non_light: number of non-light pixels (intensity<contrast_light_threshold)
   cnt_ht: number of non-light pixels detected as half-toned from a half-tone detection module
   cnt_dark_neutral: number of dark (intensity<=contrast_dark_threshold) and neutral (non-color) pixels
   cnt_dark_clr: number of dark and colored pixels
   cnt_other_clr: number of colored pixels with medium intensity (contrast_dark_threshold<intensity<contrast_light_threshold)
   cnt_boosted_dark_neutral: number of dark and neutral pixels after boosting (boosted_intensity<=contrast_dark_threshold)
   cnt_boosted_dark_clr: number of dark and colored pixels after boosting
   cnt_boosted_other_clr: number of colored pixels with medium intensity after boosting
   cnt_boosted_light: number of light pixels after allowing for a lower contrast (intensity>=boosted_contrast_light_threshold)
   cnt_inner_medium: number of pixels in the center 3×3 window that are dark to medium in intensity (intensity<medium_threshold)
   thin_flag: 1 if stroke is thin (cnt_inner_thin>3 OR cnt_thin*4>cnt_non_light), 0 otherwise
   bg_flag: 1 if the center 3×3 pixels are all light (intensity>contrast_light_threshold-16), 0 otherwise

The list of thresholds and statistics above in Table F is by no means limiting. In other embodiments of the invention, the thresholds and statistics may vary.

In another embodiment of the invention, the boosted intensity of a pixel is the same as the original intensity unless width<=max_thin_width. In that case, the boosted intensity is the original intensity minus a table look-up value depending on the width. The smaller the width, the more that is subtracted from the original intensity.

In an exemplary embodiment of the invention, the current pixel next is determined to be in one of the following four text categories: Text Outline, Text Body, Background, and Non-text. Table G below gives the algorithm in pseudo-code. It is noted that the thresholds are chosen empirically, i.e., by fine-tuning.

TABLE G

   if (cnt_dark_neutral>2 AND
       cnt_dark_clr<cnt_dark_clr_threshold AND
       cnt_other_clr<cnt_other_clr_threshold AND
       (cnt_ht*2)<cnt_non_light AND
       cnt_light>1) {
     text_tag=TEXT_OUTLINE;
   }
   else if (is_thin) {
     if (cnt_boosted_dark_neutral>2 AND
         cnt_boosted_dark_clr<cnt_dark_clr_threshold AND
         cnt_boosted_other_clr<cnt_other_clr_threshold AND
         (cnt_ht*2)<cnt_non_light AND
         (cnt_light>1 OR
          cnt_boosted_light>1 AND
          cnt_thin_skeleton>many_skeleton_threshold)
        )
       text_tag=TEXT_OUTLINE;
   }
   else if (is_bg) text_tag=BACKGROUND;
   else if (cnt_inner_medium==9) text_tag=TEXT_BODY;
   else text_tag=NON_TEXT;

Different sets of criteria based on stroke width can be used in the algorithm represented in Table G.
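By way of illustration only, the decision of Table G might be transcribed into Python as follows. The sketch assumes that is_thin and is_bg in Table G correspond to thin_flag and bg_flag of Table F, that AND binds more tightly than OR, and that the tag defaults to NON_TEXT when the thin branch is taken but its inner test fails. The three numeric thresholds, the class name WindowStats, and the function name preliminary_tag are hypothetical; the description states only that the thresholds are chosen empirically.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Subset of the Table F statistics used by Table G."""
    cnt_dark_neutral: int = 0
    cnt_dark_clr: int = 0
    cnt_other_clr: int = 0
    cnt_ht: int = 0
    cnt_non_light: int = 0
    cnt_light: int = 0
    cnt_boosted_dark_neutral: int = 0
    cnt_boosted_dark_clr: int = 0
    cnt_boosted_other_clr: int = 0
    cnt_boosted_light: int = 0
    cnt_thin_skeleton: int = 0
    cnt_inner_medium: int = 0
    thin_flag: bool = False
    bg_flag: bool = False


# Hypothetical values; the description gives no numbers for these.
CNT_DARK_CLR_THRESHOLD = 3
CNT_OTHER_CLR_THRESHOLD = 6
MANY_SKELETON_THRESHOLD = 4


def preliminary_tag(s: WindowStats) -> str:
    tag = "NON_TEXT"  # assumed default when no branch assigns a tag
    few_halftone = s.cnt_ht * 2 < s.cnt_non_light
    if (s.cnt_dark_neutral > 2
            and s.cnt_dark_clr < CNT_DARK_CLR_THRESHOLD
            and s.cnt_other_clr < CNT_OTHER_CLR_THRESHOLD
            and few_halftone and s.cnt_light > 1):
        tag = "TEXT_OUTLINE"
    elif s.thin_flag:
        # Thin strokes are re-tested with the boosted statistics.
        if (s.cnt_boosted_dark_neutral > 2
                and s.cnt_boosted_dark_clr < CNT_DARK_CLR_THRESHOLD
                and s.cnt_boosted_other_clr < CNT_OTHER_CLR_THRESHOLD
                and few_halftone
                and (s.cnt_light > 1
                     or (s.cnt_boosted_light > 1
                         and s.cnt_thin_skeleton > MANY_SKELETON_THRESHOLD))):
            tag = "TEXT_OUTLINE"
    elif s.bg_flag:
        tag = "BACKGROUND"
    elif s.cnt_inner_medium == 9:
        tag = "TEXT_BODY"
    return tag
```

A window with several dark neutral pixels next to light pixels and few half-tone pixels is tagged TEXT_OUTLINE by the first test; a thin stroke that fails that test can still qualify through the boosted statistics.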

In an exemplary embodiment of the invention, after the current window finishes processing, its center is moved by J pixels so that a subsampled text tag is determined. That is, each text tag represents the text category of a J×J block of pixels. In an exemplary embodiment, J=3.

In addition to the text tag, the lightest intensity detected in the window is passed as input to the next step. In one embodiment of the invention, the TE boost flag is widened by turning it on for the entire block, whenever any of the pixels in the block has its TE boost flag on.

Step 5: Consistency Check and Final Decision

In an exemplary embodiment of the invention, in this step regions with text tag=Text Outline are widened to ensure no text is missed while simultaneously performing a consistency check. An N×N window, with N=5 in an exemplary embodiment, of text tags, i.e., N×N blocks, each block representing J×J pixels, is used to accumulate the statistics shown in Table H below.

TABLE H

   cnt_non_text: number of blocks with text_tag=NON_TEXT
   cnt_inner_text: number of blocks among the center 3×3 blocks with text_tag=TEXT_OUTLINE

In an exemplary embodiment of the invention, the final decision of the text tag is as follows below in Table I.

TABLE I

   if (cnt_inner_text AND cnt_non_text<=cnt_non_text_threshold) {
     if (crnt_text_tag=TEXT_BODY) text_tag=TEXT_BODY;
     else text_tag=TEXT_OUTLINE;
   }
   else {
     if (crnt_text_tag=BACKGROUND) text_tag=BACKGROUND;
     else if (crnt_text_tag=TEXT_BODY) text_tag=TEXT_BODY;
     else text_tag=NON_TEXT;
   }

In Table I above, cnt_non_text_threshold is the maximum number of non-text blocks allowed in the window to continue to consider the center block as text outline. In another embodiment of the invention, the lightest intensity among the lightest pixels in each block of the window is determined and passed along to a text enhancement module as an estimate of the background intensity level. In the text enhancement module, only text outlines are enhanced.
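As a non-authoritative illustration, the accumulation of Table H and the decision of Table I might be combined in Python as shown below. The function name final_tag and its argument layout are hypothetical. The sketch assumes that the N×N window of blocks lies entirely within the grid of text tags, and it takes cnt_non_text_threshold as a parameter because no value is stated.

```python
TEXT_OUTLINE = "TEXT_OUTLINE"
TEXT_BODY = "TEXT_BODY"
BACKGROUND = "BACKGROUND"
NON_TEXT = "NON_TEXT"


def final_tag(tags, by, bx, cnt_non_text_threshold, n=5):
    """Final text tag for the block at row `by`, column `bx` of `tags`.

    `tags` is a grid of preliminary text tags, one per J x J block.
    Assumption: the n x n window around (by, bx) fits inside the grid.
    """
    half = n // 2
    cnt_non_text = 0    # Table H: NON_TEXT blocks in the whole window
    cnt_inner_text = 0  # Table H: TEXT_OUTLINE blocks in the center 3 x 3
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            tag = tags[by + dy][bx + dx]
            if tag == NON_TEXT:
                cnt_non_text += 1
            if abs(dy) <= 1 and abs(dx) <= 1 and tag == TEXT_OUTLINE:
                cnt_inner_text += 1

    crnt_text_tag = tags[by][bx]
    # Table I: widen text outlines unless too many non-text blocks are nearby.
    if cnt_inner_text and cnt_non_text <= cnt_non_text_threshold:
        return TEXT_BODY if crnt_text_tag == TEXT_BODY else TEXT_OUTLINE
    if crnt_text_tag in (BACKGROUND, TEXT_BODY):
        return crnt_text_tag
    return NON_TEXT
```

In this sketch a background block adjacent to a text outline block is promoted to Text Outline, which corresponds to the widening described for this step, whereas a text outline block surrounded by too many non-text blocks is demoted to Non-text.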

FIG. 3 is a block schematic diagram of an exemplary embodiment of an image processing system 300 that includes a contrast based processing module according to the invention. Image information is provided to the system 300 as scanned intensity values from a scanner 301 or from memory 302, but the invention is not limited to either.

More specifically, the image information is provided either to a local ramp detection module 310, or to a module that performs preliminary detection of text based on contrast and stroke width 320. Output from the local ramp detection module 310 is provided to an identification of intensity troughs module 311. Output from the identification of intensity troughs module 311 is provided to a determination of stroke width module 312. Output from the determination of stroke width module 312 is provided to the preliminary detection of text based on contrast and stroke width module 320. Output from the preliminary detection of text based on contrast and stroke width module 320 is provided to a consistency check module 321. The final output from the system 300, which is the output from the consistency check module 321, is provided to other modules for further adjustments 350. The final results either are stored in memory 351 and then printed 352, or are sent directly for printing 352.

FIG. 4 a is a flow diagram of a text detection path that includes a contrast based processing component according to the invention. Scanned intensity values are provided as input (401) to the step for preprocessing for thin text (410). Output is then provided (402) to the step for processing based on contrast (420). In another embodiment of the invention, the input is provided (401) directly to the step for processing based on contrast (420), whereby the preprocessing for thin text step (410) is not required.

FIG. 4 b is a flow diagram of an embodiment of the text detection path of FIG. 4 a in which the preprocessing step (410) is further broken down into three separate steps (411-413). They are the local ramp detection step (411), the identification of intensity troughs step (412), and the determination of stroke width step (413), respectively. In addition, the contrast based processing step (420) is further broken down into a preliminary detection of text based on contrast and stroke width step (421) and a consistency check step (422).
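The ordering of the steps in FIGS. 4 a and 4 b may be summarized by the following Python sketch, offered only as an illustration of the data flow. The function run_text_detection and the five stage callbacks are hypothetical placeholders, and passing both the intensities and the stroke widths to step 421 is an assumed interface rather than one stated in the description.

```python
def run_text_detection(intensities, detect_ramps, find_troughs,
                       measure_widths, preliminary_detect,
                       consistency_check, preprocess_thin_text=True):
    """Chain the detection steps in the order shown in FIG. 4 b.

    Every stage argument is a hypothetical callable supplied by the caller.
    """
    stroke_widths = None
    if preprocess_thin_text:
        ramps = detect_ramps(intensities)         # step 411
        troughs = find_troughs(ramps)             # step 412
        stroke_widths = measure_widths(troughs)   # step 413
    # Assumption: step 421 accepts the intensities plus optional stroke widths.
    tags = preliminary_detect(intensities, stroke_widths)  # step 421
    return consistency_check(tags)                          # step 422
```

Setting preprocess_thin_text to False corresponds to the embodiment in which the input is provided directly to the contrast based processing step (420).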

Although the invention has been described in detail with reference to exemplary embodiments, persons possessing ordinary skill in the art to which this invention pertains will appreciate that various modifications and enhancements may be made without departing from the spirit and scope of the claims that follow.

1. A method having scanned intensity information as input for detecting text in a scanned page by observing a very strong contrast in a localized region between a dark side and a light side, the method comprising: determining a stroke width; contrast-based text detection processing; wherein the localized region comprises a substantially sharp edge between the dark side and the light side; and whereby any of black text on white background, black text on color background, and white or light text on a dark background are detected.
2. The method of claim 1, further comprising measuring a color saturation value and using the value to improve detection accuracy, wherein the color saturation value of the dark side is required to be small.
3. The method of claim 2, further comprising preliminarily single pixel processing to estimate the color saturation value using prior color information provided by the scanner.
4. The method of claim 1, further comprising detecting the presence of half-tone pixels by using a local indicator to improve detection accuracy.
5. The method of claim 4, wherein the half-tone detection is obtained through an algorithm for half-tone detection.
6. The method of claim 1, wherein the pre-processing further comprises: detecting a local ramp; and identifying an intensity trough.
7. The method of claim 1, wherein the contrast-based text detection processing further comprises: detecting text preliminarily based on local contrast and stroke width; and consistency checking.
8. The method of claim 1, wherein the observing a strong contrast further comprises: detecting text preliminarily based on local contrast; and consistency checking.
9. The method of claim 6, further comprising: detecting a local ramp; identifying an intensity trough; detecting text preliminarily based on contrast and stroke width; and consistency checking.
10. The method of claim 6, wherein identifying an intensity trough uses a finite state machine algorithm, the algorithm having a sweeping procedure.
11. The method of claim 6, wherein the stroke width determination step further comprises: determining a width and a skeleton, wherein the width is a distance value and the skeleton is a skeletal line; and detecting closely touching text strokes.
12. The method of claim 11, wherein the width and skeleton determining step further comprises: setting the width value to the smaller of a vertical distance and a horizontal distance between two edges of the stroke; and determining the skeletal line as a roughly equidistant line from the edges.
13. The method of claim 11, wherein the detecting closely touching text strokes further comprises detecting a pattern of dark-light-dark (DLD) in a horizontal or a vertical direction within a very small window.
14. The method of claim 7, wherein the detecting text further comprises deciding whether a current pixel is a text pixel by using the local contrast present in an N×N window having a center over a set of pixels and centered at the current pixel, and stroke width at the current pixel.
15. The method of claim 14, wherein N=9.
16. The method of claim 14, wherein numerous statistics of the pixels within the N×N window are collected by using a set of thresholds.
17. The method of claim 16, wherein the set of thresholds comprises any of: a first minimum intensity level for text background; a maximum intensity level of text to be detected; a second minimum intensity level for text background around crowded text strokes, wherein the second minimum intensity level is smaller than the first minimum intensity level; a medium threshold value, wherein the medium threshold value is around 50% intensity; a first maximum width of a stroke, wherein the first width is considered thin; and a second maximum width of a stroke, wherein the second width is considered very thin; and wherein the numerous statistics comprise any of: a number of pixels that are thin; a number of pixels in the center of a 3×3 window that are thin; a number of pixels on a skeleton, wherein the skeleton pixels are very thin; a minimum width among pixels of the center 3×3 pixels; a second smallest width among the pixels of the center 3×3 pixels, wherein the second smallest width is equal to the minimum width among pixels of the center 3×3 pixels if more than 1 pixel has the minimum width; a highest intensity present in the N×N window; a number of light pixels; a number of non-light pixels; a number of non-light pixels detected as half-toned from a half-tone detection module; a number of dark and neutral pixels; a number of dark and colored pixels; a number of colored pixels with medium intensity; a number of dark and neutral pixels after boosting; a number of pixels in the center 3×3 window, wherein the pixels are dark to medium in intensity; a thin flag set to 1 if the stroke is thin, or set to zero otherwise; and a background flag set to 1 if the center 3×3 pixels are all light, or set to zero otherwise.
18. The method of claim 16, further comprising determining if the current pixel is in a category of a set of predetermined categories using an associated algorithm and the set of thresholds, wherein the thresholds are chosen empirically.
19. The method of claim 18, wherein the predetermined set of categories comprises: Text Outline; Text Body; Background; and Non-text.
20. The method of claim 18, further comprising moving the center of the N×N window by J pixels to obtain a subsampled text tag.
 21. The method of claim 20, wherein J=3.
22. The method of claim 7, wherein the consistency checking further comprises: accumulating a set of statistics using an N×N window of text tags and a set of thresholds; and deciding by using the set of statistics if each of the text tags is any of: Text Outline; Text Body; Background; and Non-text.
23. The method of claim 22, wherein the N×N window further comprises N×N blocks, each block representing J×J pixels.
24. The method of claim 23, wherein N=5 and J=3.
25. The method of claim 22, wherein the set of thresholds comprises a maximum number of Non-text blocks threshold.
26. An apparatus for receiving scanned intensity information as input for detecting text in a scanned page by observing a very strong contrast in a localized region between a dark side and a light side, the apparatus comprising: a module for pre-processing for stroke width determination; and a module for contrast-based text detection processing; wherein the localized region comprises a substantially sharp edge between the dark side and the light side; and whereby any of black text on white background, black text on color background, and white or light text on a dark background are detected.